NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

A systematic evaluation of the language-of-viral-escape model using multiple machine learning frameworks

https://doi.org/10.1098/rsif.2024.0598

Allman, Brent E; Vieira, Luiz; Diaz, Daniel J; Wilke, Claus O (April 2025, Journal of The Royal Society Interface)

Predicting the evolutionary patterns of emerging and endemic viruses is key for mitigating their spread. In particular, it is critical to rapidly identify mutations with the potential for immune escape or increased disease burden. Knowing which circulating mutations pose a concern can inform treatment or mitigation strategies such as alternative vaccines or targeted social distancing. In 2021, Hie B, Zhong ED, Berger B, Bryson B. 2021 Learning the language of viral evolution and escape.Science371, 284–288. (doi:10.1126/science.abd7331) proposed that variants of concern can be identified using two quantities extracted from protein language models, grammaticality and semantic change. These quantities are defined by analogy to concepts from natural language processing. Grammaticality is intended to be a measure of whether a variant viral protein is viable, and semantic change is intended to be a measure of potential for immune escape. Here, we systematically test this hypothesis, taking advantage of several high-throughput datasets that have become available, and also comparing this model with several more recently published machine learning models. We find that grammaticality can be a measure of protein viability, though methods that are trained explicitly to predict mutational effects appear to be more effective. By contrast, we do not find compelling evidence that semantic change is a useful tool for identifying immune escape mutations.
more » « less
Full Text Available
Machine Learning Guided Rational Design of a Non‐Heme Iron‐Based Lysine Dioxygenase Improves its Total Turnover Number

https://doi.org/10.1002/cbic.202400495

Hunter_Wilson, R; Diaz, Daniel J; Damodaran, Anoop R; Bhagi‐Damodaran, Ambika (December 2024, ChemBioChem)

Abstract Highly selective C−H functionalization remains an ongoing challenge in organic synthetic methodologies. Biocatalysts are robust tools for achieving these difficult chemical transformations. Biocatalyst engineering has often required directed evolution or structure‐based rational design campaigns to improve their activities. In recent years, machine learning has been integrated into these workflows to improve the discovery of beneficial enzyme variants. In this work, we combine a structure‐based self‐supervised machine learning framework, MutComputeX, with classical molecular dynamics simulations to down select mutations for rational design of a non‐heme iron‐dependent lysine dioxygenase, LDO. This approach consistently resulted in functional LDO mutants and circumvents the need for extensive study of mutational activity before‐hand. Our rationally designed single mutants purified with up to 2‐fold higher expression yields than WT and displayed higher total turnover numbers (TTN). Combining five such single mutations into a pentamutant variant, LPNYI LDO, leads to a 40 % improvement in the TTN (218±3) as compared to WT LDO (TTN=160±2). Overall, this work offers a low‐barrier approach for those seeking to synergize machine learning algorithms with pre‐existing protein engineering strategies.
more » « less
Full Text Available
Distilling Structural Representations into Protein Sequence Models

https://doi.org/10.1101/2024.11.08.622579

Ouyang-Zhang, Jeffrey; Gong, Chengyue; Zhao, Yue; Krähenbühl, Philipp; Klivans, Adam R; Diaz, Daniel J (November 2024, bioRxiv)

Abstract Protein language models, like the popular ESM2, are widely used tools for extracting evolution-based protein representations and have achieved significant success on downstream biological tasks. Representations based on sequence and structure models, however, show significant performance differences depending on the downstream task. A major open problem is to obtain representations that best capture both the evolutionary and structural properties of proteins in general. Here we introduceImplicitStructureModel(ISM), a sequence-only input model with structurally-enriched representations that outperforms state-of-the-art sequence models on several well-studied benchmarks including mutation stability assessment and structure prediction. Our key innovations are a microenvironment-based autoencoder for generating structure tokens and a self-supervised training objective that distills these tokens into ESM2’s pre-trained model. We have madeISM’s structure-enriched weights easily available: integrating ISM into any application using ESM2 requires changing only a single line of code. Our code is available athttps://github.com/jozhang97/ISM.
more » « less
Full Text Available
Stability Oracle: a structure-based graph-transformer framework for identifying stabilizing mutations

https://doi.org/10.1038/s41467-024-49780-2

Diaz, Daniel J; Gong, Chengyue; Ouyang-Zhang, Jeffrey; Loy, James M; Wells, Jordan; Yang, David; Ellington, Andrew D; Dimakis, Alexandros G; Klivans, Adam R (December 2024, Nature Communications)

Abstract Engineering stabilized proteins is a fundamental challenge in the development of industrial and pharmaceutical biotechnologies. We present Stability Oracle: a structure-based graph-transformer framework that achieves SOTA performance on accurately identifying thermodynamically stabilizing mutations. Our framework introduces several innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time, such as: Thermodynamic Permutations for data augmentation, structural amino acid embeddings to model a mutation with a single structure, a protein structure-specific attention-bias mechanism that makes transformers a viable alternative to graph neural networks. We provide training/test splits that mitigate data leakage and ensure proper model evaluation. Furthermore, to examine our data engineering contributions, we fine-tune ESM2 representations (Prostata-IFML) and achieve SOTA for sequence-based models. Notably, Stability Oracle outperforms Prostata-IFML even though it was pretrained on 2000X less proteins and has 548X less parameters. Our framework establishes a path for fine-tuning structure-based transformers to virtually any phenotype, a necessary task for accelerating the development of protein-based biotechnologies.
more » « less
Full Text Available
Virtual Reality Pursuit: Using Individual Predispositions towards VR to Understand Perceptions of a Virtualized Workplace Team Experience

https://doi.org/10.3390/virtualworlds3040023

Sanchez, Diana R; McVeigh-Schultz, Joshua; Isbister, Katherine; Tran, Monica; Martinez, Kassidy; Dost, Marjan; Osborne, Anya; Diaz, Daniel; Farillas, Philip; Lang, Timothy; et al (October 2024, Virtual Worlds)

This study investigates how individual predispositions toward Virtual Reality (VR) affect user experiences in collaborative VR environments, particularly in workplace settings. By adapting the Video Game Pursuit Scale to measure VR predisposition, we aim to establish the reliability and validity of this adapted measure in assessing how personal characteristics influence engagement and interaction in VR. Two studies, the first correlational and the second quasi-experimental, were conducted to examine the impact of environmental features, specifically the differences between static and mobile VR platforms, on participants’ perceptions of time, presence, and task motivation. The findings indicate that individual differences in VR predisposition significantly influence user experiences in virtual environments with important implications for enhancing VR applications in training and team collaboration. This research contributes to the understanding of human–computer interaction in VR and offers valuable insights for organizations aiming to implement VR technologies effectively. The results highlight the importance of considering psychological factors in the design and deployment of VR systems, paving the way for future research in this rapidly evolving field.
more » « less
Full Text Available
Institute for Foundations of Machine Learning (IFML): Advancing AI systems that will transform our world

https://doi.org/10.1002/aaai.12163

Klivans, Adam; Dimakis, Alexandros G.; Grauman, Kristen; Tamir, Jonathan I.; Diaz, Daniel J.; Davidson, Karen (March 2024, AI Magazine)

Abstract The Institute for Foundations of Machine Learning (IFML) focuses on core foundational tools to power the next generation of machine learning models. Its research underpins the algorithms and data sets that make generative artificial intelligence (AI) more accurate and reliable. Headquartered at The University of Texas at Austin, IFML researchers collaborate across an ecosystem that spans University of Washington, Stanford, UCLA, Microsoft Research, the Santa Fe Institute, and Wichita State University. Over the past year, we have witnessed incredible breakthroughs in AI on topics that are at the heart of IFML's agenda, such as foundation models, LLMs, fine‐tuning, and diffusion with game‐changing applications influencing almost every area of science and technology. In this article, we seek to highlight seek to highlight the application of foundational machine learning research on key use‐inspired topics:Fairness in Imaging with Deep Learning: designing the correct metrics and algorithms to make deep networks less biased.Deep proteins: using foundational machine learning techniques to advance protein engineering and launch a biomanufacturing revolution.Sounds and Space for Audio‐Visual Learning: building agents capable of audio‐visual navigation in complex 3D environments via new data augmentations.Improving Speed and Robustness of Magnetic Resonance Imaging: using deep learning algorithms to develop fast and robust MRI methods for clinical diagnostic imaging.IFML is also responding to explosive industry demand for an AI‐capable workforce. We have launched an accessible, affordable, and scalable new degree program—the MSAI—that looks to wholly reshape the AI/ML workforce pipeline.
more » « less
Full Text Available
Asymmetric Synthesis of α-Chloroamides via Photoenzymatic Hydroalkylation of Olefins

https://doi.org/10.1021/jacs.4c00927

Liu, Yi; Bender, Sophie G; Sorigue, Damien; Diaz, Daniel J; Ellington, Andrew D; Mann, Greg; Allmendinger, Simon; Hyster, Todd K (March 2024, Journal of the American Chemical Society)

Full Text Available
Reliable edge machine learning hardware for scientific applications

https://doi.org/10.1109/VTS60656.2024.10538639

Baldi, Tommaso; Campos, Javier; Hawks, Ben; Ngadiuba, Jennifer; Tran, Nhan; Diaz, Daniel; Duarte, Javier; Kastner, Ryan; Meza, Andres; Quinnan, Melissa; et al (April 2024, IEEE)

Full Text Available
HHH whitepaper

https://doi.org/10.1140/epjc/s10052-024-13376-3

Abouabid, Hamza; Arhrib, Abdesslam; Arnold, Hannah; Azevedo, Duarte; Brigljevic, Vuko; Chen, Maggie; Diaz, Daniel; Duarte, Javier; du_Pree, Tristan; El_Falaki, Jaouad; et al (November 2024, The European Physical Journal C)

We here report on the progress of the HHH Workshop, that took place in Dubrovnik in July 2023. After the discovery of a particle that complies with the properties of the Higgs boson of the Standard Model, all Standard Model (SM) parameters are in principle determined. However, in order to verify or falsify the model, the full form of the potential has to be determined. This includes the measurement of the triple and quartic scalar couplings.
more » « less
Full Text Available
HotProtein: A Novel Framework for Protein Thermostability Prediction and Editing

Chen, Tianlong; Gong, Chengyue; Diaz, Daniel J; Chen, Xuxi; Wells, Jordan T; Liu, Qiang; Wang, Zhangyang; Ellington, Andrew D; Dimakis, Alexandros G; Klivans, Adam (February 2023, ICLR 2023 https://openreview.net/forum?id=YDJRFWBMNby)

The molecular basis of protein thermal stability is only partially understood and has major significance for drug and vaccine discovery. The lack of datasets and standardized benchmarks considerably limits learning-based discovery methods. We present \texttt{HotProtein}, a large-scale protein dataset with \textit{growth temperature} annotations of thermostability, containing K amino acid sequences and K folded structures from different species with a wide temperature range. Due to functional domain differences and data scarcity within each species, existing methods fail to generalize well on our dataset. We address this problem through a novel learning framework, consisting of () Protein structure-aware pre-training (SAP) which leverages 3D information to enhance sequence-based pre-training; () Factorized sparse tuning (FST) that utilizes low-rank and sparse priors as an implicit regularization, together with feature augmentations. Extensive empirical studies demonstrate that our framework improves thermostability prediction compared to other deep learning models. Finally, we introduce a novel editing algorithm to efficiently generate positive amino acid mutations that improve thermostability. Codes are available in https://github.com/VITA-Group/HotProtein.
more » « less
Full Text Available

« Prev Next »

Search for: All records